In this test run, we will perform portfolio analysis, visualization, and optimization. A portfolio is a collection of financial assets such as stocks, bonds, cash, and real estate. The purpose of a portfolio is to diversify the investments and manage risks. Portfolio optimization is the process of selecting the optimal portfolio out of a set of available portfolios to maximize returns and reduce risks.

Key Definitions:

  • Open - The opening price at which a security first trades when the stock market opens.
  • High - The highest price at which the security is traded during a regular trading session.
  • Low - The lowest price at which the security is traded during a regular trading session.
  • Close - The closing or the last price at which the security trades during a regular trading session.
  • Volume - No. of shares traded between the market open and close.
  • Adjusted Close - adjusted stock close price after considering stock splits and dividends.
In [1]:
# IMPORT LIBRARIES AND DATASETS
# Import key librares and modules 
import pandas as pd
import numpy as np

# Import datetime module that comes pre-installed in Python 
# datetime offers classes that work with date & time information
import datetime as dt
In [10]:
# Use Pandas to read stock data
stock_amzn = pd.read_csv('AMZN01.csv')
stock_amzn.head(241)
Out[10]:
Date Open High Low Close Adj Close Volume
0 2004-08-01 1.922500 2.041500 1.742500 1.907000 1.907000 3600448000
1 2004-09-01 1.911500 2.214000 1.868000 2.043000 2.043000 3523538000
2 2004-10-01 2.064500 2.135000 1.650000 1.706500 1.706500 4171368000
3 2004-11-01 1.711000 2.075000 1.707500 1.984000 1.984000 3369498000
4 2004-12-01 2.000000 2.284000 1.899500 2.214500 2.214500 3520296000
... ... ... ... ... ... ... ...
236 2024-04-01 180.789993 189.770004 166.320007 175.000000 175.000000 917021100
237 2024-05-01 181.639999 191.699997 173.869995 176.440002 176.440002 892301700
238 2024-06-01 177.699997 199.839996 175.919998 193.250000 193.250000 813276000
239 2024-07-01 193.490005 201.199997 192.820007 195.050003 195.050003 303090400
240 2024-07-11 200.089996 200.269897 192.860001 195.050003 195.050003 43773883

241 rows × 7 columns

In [11]:
# Count the number of missing values in "stock_amzn" Pandas DataFrame
stock_amzn.isnull().sum()
Out[11]:
Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
In [12]:
# Obtain information about the Pandas DataFrame such as data types, memory utilization..etc
stock_amzn.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 241 entries, 0 to 240
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       241 non-null    object 
 1   Open       241 non-null    float64
 2   High       241 non-null    float64
 3   Low        241 non-null    float64
 4   Close      241 non-null    float64
 5   Adj Close  241 non-null    float64
 6   Volume     241 non-null    int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 13.3+ KB

Calculating asset returns is a fundamental step in portfolio optimization. Asset returns can be calculated in several ways depending on the type of return. Simple returns (also known as arithmetic returns) are calculated as the percentage change in the price of an asset over a given period. The formula for simple returns is: Rt = (Pt - Pt-1) / Pt-1

where: Rt = Return at time t. Pt = Price of the security at time t. Pt-1 = Price of the security at time t-1.

In [13]:
# Calculate the percentage monthly return
stock_amzn['Monthly Return'] = stock_amzn['Adj Close'].pct_change(1) * 100
stock_amzn
Out[13]:
Date Open High Low Close Adj Close Volume Monthly Return
0 2004-08-01 1.922500 2.041500 1.742500 1.907000 1.907000 3600448000 NaN
1 2004-09-01 1.911500 2.214000 1.868000 2.043000 2.043000 3523538000 7.131620
2 2004-10-01 2.064500 2.135000 1.650000 1.706500 1.706500 4171368000 -16.470876
3 2004-11-01 1.711000 2.075000 1.707500 1.984000 1.984000 3369498000 16.261354
4 2004-12-01 2.000000 2.284000 1.899500 2.214500 2.214500 3520296000 11.617944
... ... ... ... ... ... ... ... ...
236 2024-04-01 180.789993 189.770004 166.320007 175.000000 175.000000 917021100 -2.982595
237 2024-05-01 181.639999 191.699997 173.869995 176.440002 176.440002 892301700 0.822858
238 2024-06-01 177.699997 199.839996 175.919998 193.250000 193.250000 813276000 9.527317
239 2024-07-01 193.490005 201.199997 192.820007 195.050003 195.050003 303090400 0.931438
240 2024-07-11 200.089996 200.269897 192.860001 195.050003 195.050003 43773883 0.000000

241 rows × 8 columns

In [16]:
# We have noticed "NaN" in the first row. Let's replace the first row with zeros instead of NaN
stock_amzn.replace({'Daily Return': 0}, inplace = True)
stock_amzn
Out[16]:
Date Open High Low Close Adj Close Volume Monthly Return
0 2004-08-01 1.922500 2.041500 1.742500 1.907000 1.907000 3600448000 0.000000
1 2004-09-01 1.911500 2.214000 1.868000 2.043000 2.043000 3523538000 7.131620
2 2004-10-01 2.064500 2.135000 1.650000 1.706500 1.706500 4171368000 -16.470876
3 2004-11-01 1.711000 2.075000 1.707500 1.984000 1.984000 3369498000 16.261354
4 2004-12-01 2.000000 2.284000 1.899500 2.214500 2.214500 3520296000 11.617944
... ... ... ... ... ... ... ... ...
236 2024-04-01 180.789993 189.770004 166.320007 175.000000 175.000000 917021100 -2.982595
237 2024-05-01 181.639999 191.699997 173.869995 176.440002 176.440002 892301700 0.822858
238 2024-06-01 177.699997 199.839996 175.919998 193.250000 193.250000 813276000 9.527317
239 2024-07-01 193.490005 201.199997 192.820007 195.050003 195.050003 303090400 0.931438
240 2024-07-11 200.089996 200.269897 192.860001 195.050003 195.050003 43773883 0.000000

241 rows × 8 columns

In [17]:
# Use the describe() method to obtain a statistical summary about the data 
# Over the specified time period, the average adjusted close price for Amazon stock was $51.159 
# The maximum adjusted close price was $195.05
# The maximum volume of shares traded on one day were 32,76,182,000
stock_amzn.describe().round(2)
Out[17]:
Open High Low Close Adj Close Volume Monthly Return
count 241.00 241.00 241.00 241.00 241.00 2.410000e+02 241.00
mean 50.52 54.24 47.31 51.16 51.16 2.248021e+09 2.46
std 57.87 61.83 54.30 58.22 58.22 1.150596e+09 10.38
min 1.33 1.55 1.29 1.34 1.34 4.377388e+07 -30.48
25% 4.62 4.87 4.16 4.66 4.66 1.432892e+09 -4.30
50% 17.00 18.29 15.88 17.73 17.73 1.888910e+09 2.30
75% 90.22 97.82 85.87 92.39 92.39 2.843062e+09 8.23
max 200.09 201.20 192.86 195.05 195.05 6.925740e+09 54.13

As we have considered a single stock (Amazon), let's perform some data visualization on the same.

In [18]:
# Matplotlib is a comprehensive data visualization library in Python 
# Seaborn is a visualization library that sits on top of matplotlib and offers enhanced features 
# plotly.express module contains functions that can create interactive figures using a few lines of code

import matplotlib.pyplot as plt

!pip install seaborn
import seaborn as sns

import plotly.express as px
Requirement already satisfied: seaborn in c:\users\ok\documents\python_working_files\lib\site-packages (0.13.2)
Requirement already satisfied: numpy!=1.24.0,>=1.20 in c:\users\ok\documents\python_working_files\lib\site-packages (from seaborn) (1.26.4)
Requirement already satisfied: pandas>=1.2 in c:\users\ok\documents\python_working_files\lib\site-packages (from seaborn) (2.2.2)
Requirement already satisfied: matplotlib!=3.6.1,>=3.4 in c:\users\ok\documents\python_working_files\lib\site-packages (from seaborn) (3.8.4)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.2.0)
Requirement already satisfied: cycler>=0.10 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (1.4.4)
Requirement already satisfied: packaging>=20.0 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (23.2)
Requirement already satisfied: pillow>=8 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (10.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in c:\users\ok\documents\python_working_files\lib\site-packages (from matplotlib!=3.6.1,>=3.4->seaborn) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\ok\documents\python_working_files\lib\site-packages (from pandas>=1.2->seaborn) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\ok\documents\python_working_files\lib\site-packages (from pandas>=1.2->seaborn) (2023.3)
Requirement already satisfied: six>=1.5 in c:\users\ok\documents\python_working_files\lib\site-packages (from python-dateutil>=2.7->matplotlib!=3.6.1,>=3.4->seaborn) (1.16.0)
In [21]:
# Let's plot a Line Plot Using Plotly Express
fig = px.line(title = 'Amazon.com, Inc. (AMZN) Adjusted Closing Price [$]')
fig.add_scatter(x = stock_amzn['Date'], y = stock_amzn['Adj Close'], name = 'Adj Close')
stock_amzn
Out[21]:
Date Open High Low Close Adj Close Volume Monthly Return
0 2004-08-01 1.922500 2.041500 1.742500 1.907000 1.907000 3600448000 0.000000
1 2004-09-01 1.911500 2.214000 1.868000 2.043000 2.043000 3523538000 7.131620
2 2004-10-01 2.064500 2.135000 1.650000 1.706500 1.706500 4171368000 -16.470876
3 2004-11-01 1.711000 2.075000 1.707500 1.984000 1.984000 3369498000 16.261354
4 2004-12-01 2.000000 2.284000 1.899500 2.214500 2.214500 3520296000 11.617944
... ... ... ... ... ... ... ... ...
236 2024-04-01 180.789993 189.770004 166.320007 175.000000 175.000000 917021100 -2.982595
237 2024-05-01 181.639999 191.699997 173.869995 176.440002 176.440002 892301700 0.822858
238 2024-06-01 177.699997 199.839996 175.919998 193.250000 193.250000 813276000 9.527317
239 2024-07-01 193.490005 201.199997 192.820007 195.050003 195.050003 303090400 0.931438
240 2024-07-11 200.089996 200.269897 192.860001 195.050003 195.050003 43773883 0.000000

241 rows × 8 columns

In [24]:
# This function coded below performs interactive data visualization using Plotly Express
def plot_financial_data(amzn, title):
    
    fig = px.line(title = title)
    
    # For loop that plots all stock prices in the pandas dataframe pd
    # Note that index starts with 1 because we want to skip the date column
    
    for i in amzn.columns[1:]:
        fig.add_scatter(x = amzn['Date'], y = amzn[i], name = i)
        fig.update_traces(line_width = 5)
        fig.update_layout({'plot_bgcolor': "white"})

    fig.show()
In [26]:
# Plot High, Low, Open, Close and Adj Close
plot_financial_data(stock_amzn.drop(['Volume', 'Monthly Return'], axis = 1), 'Amazon.com, Inc. (AMZN) Stock Price [$]')
In [27]:
# Plot trading volume
plot_financial_data(stock_amzn.iloc[:,[0,5]], 'Amazon.com, Inc. (AMZN) Trading Volume')
In [28]:
# Plot % Monthly Returns
plot_financial_data(stock_amzn.iloc[:,[0,7]], 'Amazon.com, Inc. (AMZN) Percentage Monthly Return [%]')
In [43]:
# Plot the stocks monthly returns
plot_financial_data(stock_data, 'Percentage Monthly Returns [%]')
In [1]:
import pandas as pd
import yfinance as yf  # Import yfinance library for stock data
import numpy as np

# Define a list of stock tickers we are interested in.
tickers = ['ADBE', 'AMZN', 'CAT', 'DE', 'EXC', 'JNJ', 'JPM', 'MSFT', 'PFE', 'PG']

# Create an empty DataFrame to store adjusted close prices
adj_close_data = pd.DataFrame()

# Download the adjusted close prices for each stock
for ticker in tickers:
    # Fetch the data from Yahoo Finance
    data = yf.download(ticker, start='2004-01-01', end='2024-01-01', progress=False)
    
    # Select only the Adjusted Close prices and rename the column to the ticker symbol
    adj_close_data[ticker] = data['Adj Close']

print("Adjusted Close Prices DataFrame:")
print(adj_close_data.head())

# Calculate the percentage monthly returns
monthly_returns_df = adj_close_data.resample('M').ffill().pct_change() * 100  # Resample to monthly frequency, forward-fill missing data, calculate percentage change

# Replace NaN values with 0 (if any, but there shouldn't be any after pct_change)
monthly_returns_df.replace(np.nan, 0, inplace=True)

print("\nMonthly Returns DataFrame:")
print(monthly_returns_df.head())
Adjusted Close Prices DataFrame:
                 ADBE    AMZN        CAT         DE        EXC        JNJ  \
Date                                                                        
2004-01-02  19.518396  2.5950  24.635588  21.555693  10.938865  29.117960   
2004-01-05  19.867832  2.6635  25.306253  21.585871  10.988595  29.287048   
2004-01-06  19.887802  2.6515  24.859148  20.985796  10.909038  29.157417   
2004-01-07  19.694548  2.5950  24.587902  21.522177  10.975337  29.168690   
2004-01-08  18.975481  2.5120  24.397137  21.367962  10.976991  29.303968   

                  JPM       MSFT        PFE         PG  
Date                                                    
2004-01-02  20.959133  17.105961  15.121106  28.238073  
2004-01-05  20.919071  17.535954  15.525189  28.377855  
2004-01-06  21.445629  17.598263  15.516676  28.235224  
2004-01-07  21.760416  17.579573  15.648537  27.984182  
2004-01-08  22.132431  17.548412  15.503916  28.278000  

Monthly Returns DataFrame:
                ADBE       AMZN       CAT        DE       EXC       JNJ  \
Date                                                                      
2004-01-31  0.000000   0.000000  0.000000  0.000000  0.000000  0.000000   
2004-02-29 -2.741515 -14.662695 -3.046225  2.603812  1.072320  1.362429   
2004-03-31  5.537608   0.627760  4.382843  8.353766  2.576653 -5.917244   
2004-04-30  5.597969   0.739373 -1.244059 -1.832327 -2.802329  6.526017   
2004-05-31  7.542179  11.238527 -3.061856 -3.439193  0.365505  3.647175   

                 JPM      MSFT       PFE        PG  
Date                                                
2004-01-31  0.000000  0.000000  0.000000  0.000000  
2004-02-29  5.476989 -4.050699  0.503663  1.414769  
2004-03-31  2.267181 -6.030919 -4.365555  2.311918  
2004-04-30 -9.642520  4.813523  2.025696  1.352106  
2004-05-31 -2.021284  0.382667 -0.704461  1.918878  
C:\Users\ok\AppData\Local\Temp\ipykernel_1956\1710146094.py:23: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
  monthly_returns_df = adj_close_data.resample('M').ffill().pct_change() * 100  # Resample to monthly frequency, forward-fill missing data, calculate percentage change
In [2]:
# Basic Analysis
# Calculate and print the average monthly return for each stock
average_monthly_returns = monthly_returns_df.mean()
print("\nAverage Monthly Returns:")
print(average_monthly_returns)

# Calculate and print the standard deviation of the monthly returns for each stock
std_deviation_monthly_returns = monthly_returns_df.std()
print("\nStandard Deviation of Monthly Returns:")
print(std_deviation_monthly_returns)
Average Monthly Returns:
ADBE    1.832078
AMZN    2.271867
CAT     1.476629
DE      1.569282
EXC     0.631864
JNJ     0.771122
JPM     1.148342
MSFT    1.498255
PFE     0.424252
PG      0.771014
dtype: float64

Standard Deviation of Monthly Returns:
ADBE     8.742673
AMZN    10.639783
CAT      9.142596
DE       8.218994
EXC      5.450420
JNJ      4.204065
JPM      7.789727
MSFT     6.490790
PFE      6.052442
PG       4.457197
dtype: float64
In [3]:
import matplotlib.pyplot as plt
import seaborn as sns

# Plot the percentage monthly returns
plt.figure(figsize=(12, 8))
monthly_returns_df.plot(title='Percentage Monthly Returns for Stocks', xlabel='Date', ylabel='Monthly Return (%)', marker='o')
plt.legend(tickers, loc='best')
plt.show()

# Plot the heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(monthly_returns_df.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Correlation Heatmap of Monthly Returns')
plt.show()
<Figure size 1200x800 with 0 Axes>
No description has been provided for this image
No description has been provided for this image
In [7]:
adj_close_data.reset_index(inplace=True)
In [8]:
monthly_returns_df = adj_close_data.set_index('Date').resample('M').ffill().pct_change() * 100  # Resample to monthly frequency, forward-fill missing data, calculate percentage change
monthly_returns_df.replace(np.nan, 0, inplace=True)  # Replace NaN values with 0
C:\Users\ok\AppData\Local\Temp\ipykernel_1956\2594310399.py:1: FutureWarning: 'M' is deprecated and will be removed in a future version, please use 'ME' instead.
  monthly_returns_df = adj_close_data.set_index('Date').resample('M').ffill().pct_change() * 100  # Resample to monthly frequency, forward-fill missing data, calculate percentage change
In [9]:
monthly_returns_df()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 monthly_returns_df()

TypeError: 'DataFrame' object is not callable
In [10]:
monthly_returns_df
Out[10]:
ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG
Date
2004-01-31 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2004-02-29 -2.741515 -14.662695 -3.046225 2.603812 1.072320 1.362429 5.476989 -4.050699 0.503663 1.414769
2004-03-31 5.537608 0.627760 4.382843 8.353766 2.576653 -5.917244 2.267181 -6.030919 -4.365555 2.311918
2004-04-30 5.597969 0.739373 -1.244059 -1.832327 -2.802329 6.526017 -9.642520 4.813523 2.025696 1.352106
2004-05-31 7.542179 11.238527 -3.061856 -3.439193 0.365505 3.647175 -2.021284 0.382667 -0.704461 1.918878
... ... ... ... ... ... ... ... ... ... ...
2023-08-31 2.411345 3.239080 6.018774 -4.343588 -3.297341 -2.791216 -7.362619 -2.222562 -1.885761 -1.254008
2023-09-30 -8.838994 -7.890727 -2.891901 -7.842950 -5.807569 -3.667743 -0.895250 -3.664287 -6.246478 -5.494355
2023-10-31 4.345951 4.696345 -16.769947 -3.185119 3.043130 -4.757630 -3.402213 7.081556 -7.868551 3.508122
2023-11-30 14.838554 9.767827 10.913513 -0.260012 -0.178091 5.095414 12.239306 12.294541 1.049772 2.326201
2023-12-31 -2.358396 4.004385 17.928364 10.133882 -6.777457 1.344897 8.982584 -0.757439 -5.513621 -4.546648

240 rows × 10 columns

In [11]:
# Insert the Date column at the start of the monthly_returns_df DataFrame
monthly_returns_df.insert(0, 'Date', monthly_returns_df.index)  # Use the index as the Date
In [15]:
# Define a function that performs interactive data visualization using Plotly Express
import plotly.express as px 
def plot_financial_data(df, title):
    
    fig = px.line(title = title)
    
    # For loop that plots all stock prices in the pandas dataframe df
    # Note that index starts with 1 because we want to skip the date column
    
    for i in df.columns[1:]:
        fig.add_scatter(x = df['Date'], y = df[i], name = i)
        fig.update_traces(line_width = 5)
        fig.update_layout({'plot_bgcolor': "white"})

    fig.show()
In [16]:
# Plot closing prices using plotly Express. Note that we used the same pre-defined function "plot_financial_data"
plot_financial_data(monthly_returns_df, 'Adjusted Closing Prices [$]')
In [17]:
# Plot the stocks daily returns
plot_financial_data(monthly_returns_df, 'Percentage Monthy Returns [%]')
In [18]:
# Plot histograms for stocks monthly returns using plotly express
fig = px.histogram(monthly_returns_df.drop(columns = ['Date']))
fig.update_layout({'plot_bgcolor': "white"})
In [19]:
# Plot a heatmap showing the correlations between monthly returns
# Strong positive correlations between Catterpillar and John Deere - both into heavy equipment and machinery
# MSFT and Google - both into Tech and Cloud Computing
plt.figure(figsize = (10, 8))
sns.heatmap(monthly_returns_df.drop(columns = ['Date']).corr(), annot = True);
No description has been provided for this image
In [20]:
# Plot the Pairplot between stocks monthly returns
sns.pairplot(monthly_returns_df);
No description has been provided for this image
In [21]:
# Function to scale stock prices based on their initial starting price
# The objective of this function is to set all prices to start at a value of 1 
def price_scaling(raw_prices_df):
    scaled_prices_df = raw_prices_df.copy()
    for i in raw_prices_df.columns[1:]:
          scaled_prices_df[i] = raw_prices_df[i]/raw_prices_df[i][0]
    return scaled_prices_df
In [22]:
price_scaling(adj_close_data)
Out[22]:
Date ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG
0 2004-01-02 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 2004-01-05 1.017903 1.026397 1.027223 1.001400 1.004546 1.005807 0.998089 1.025137 1.026723 1.004950
2 2004-01-06 1.018926 1.021773 1.009075 0.973562 0.997273 1.001355 1.023212 1.028780 1.026160 0.999899
3 2004-01-07 1.009025 1.000000 0.998064 0.998445 1.003334 1.001742 1.038231 1.027687 1.034880 0.991009
4 2004-01-08 0.972184 0.968015 0.990321 0.991291 1.003485 1.006388 1.055980 1.025865 1.025316 1.001414
... ... ... ... ... ... ... ... ... ... ... ...
5028 2023-12-22 30.676188 59.121386 11.689238 18.202732 3.157575 5.256253 7.848499 21.817856 1.822835 5.079738
5029 2023-12-26 30.651084 59.117534 11.901397 18.406860 3.181754 5.279244 7.894915 21.822518 1.823477 5.102816
5030 2023-12-27 30.539395 59.090557 12.001639 18.423833 3.171008 5.286345 7.942268 21.788152 1.836314 5.107011
5031 2023-12-28 30.510704 59.105974 11.951719 18.381937 3.210411 5.294121 7.984465 21.858628 1.847867 5.095472
5032 2023-12-29 30.566034 58.551060 11.903007 18.410020 3.214888 5.299531 7.975088 21.902896 1.847867 5.123794

5033 rows × 11 columns

In [23]:
# Let's create an array that holds random portfolio weights
# Note that portfolio weights must add up to 1 
import random

def generate_portfolio_weights(n):
    weights = []
    for i in range(n):
        weights.append(random.random())
        
    # let's make the sum of all weights add up to 1
    weights = weights/np.sum(weights)
    return weights
In [25]:
# Call the function (Run this cell multiple times to generate different outputs)
weights = generate_portfolio_weights(10)
print(weights)
[0.07813861 0.1442829  0.04616586 0.14572471 0.00682053 0.1134458
 0.13514778 0.1435415  0.08587131 0.10086099]
In [26]:
# Let's define the "weights" list similar to the slides
weights = [0.07813861, 0.1442829,  0.04616586, 0.14572471, 0.00682053, 0.1134458, 0.13514778, 0.1435415,  0.08587131, 0.10086099]
weights
Out[26]:
[0.07813861,
 0.1442829,
 0.04616586,
 0.14572471,
 0.00682053,
 0.1134458,
 0.13514778,
 0.1435415,
 0.08587131,
 0.10086099]
In [27]:
# Let's display "adj_close_data" Pandas DataFrame
adj_close_data
Out[27]:
Date ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG
0 2004-01-02 19.518396 2.595000 24.635588 21.555693 10.938865 29.117960 20.959133 17.105961 15.121106 28.238073
1 2004-01-05 19.867832 2.663500 25.306253 21.585871 10.988595 29.287048 20.919071 17.535954 15.525189 28.377855
2 2004-01-06 19.887802 2.651500 24.859148 20.985796 10.909038 29.157417 21.445629 17.598263 15.516676 28.235224
3 2004-01-07 19.694548 2.595000 24.587902 21.522177 10.975337 29.168690 21.760416 17.579573 15.648537 27.984182
4 2004-01-08 18.975481 2.512000 24.397137 21.367962 10.976991 29.303968 22.132431 17.548412 15.503916 28.278000
... ... ... ... ... ... ... ... ... ... ... ...
5028 2023-12-22 598.750000 153.419998 287.971252 392.372498 34.540291 153.051361 164.497726 373.215393 27.563284 143.442017
5029 2023-12-26 598.260010 153.410004 293.197906 396.772614 34.804779 153.720825 165.470566 373.295135 27.572990 144.093689
5030 2023-12-27 596.080017 153.339996 295.667419 397.138489 34.687229 153.927582 166.463058 372.707275 27.767097 144.212158
5031 2023-12-28 595.520020 153.380005 294.437622 396.235382 35.118248 154.154007 167.347473 373.912842 27.941795 143.886322
5032 2023-12-29 596.599976 151.940002 293.237579 396.840729 35.167229 154.311539 167.150925 374.670074 27.941795 144.686081

5033 rows × 11 columns

In [28]:
# Scale stock prices using the "price_scaling" function that we defined earlier (make all stock values start at 1)
portfolio_df = adj_close_data.copy()
scaled_df = price_scaling(portfolio_df)
scaled_df
Out[28]:
Date ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG
0 2004-01-02 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
1 2004-01-05 1.017903 1.026397 1.027223 1.001400 1.004546 1.005807 0.998089 1.025137 1.026723 1.004950
2 2004-01-06 1.018926 1.021773 1.009075 0.973562 0.997273 1.001355 1.023212 1.028780 1.026160 0.999899
3 2004-01-07 1.009025 1.000000 0.998064 0.998445 1.003334 1.001742 1.038231 1.027687 1.034880 0.991009
4 2004-01-08 0.972184 0.968015 0.990321 0.991291 1.003485 1.006388 1.055980 1.025865 1.025316 1.001414
... ... ... ... ... ... ... ... ... ... ... ...
5028 2023-12-22 30.676188 59.121386 11.689238 18.202732 3.157575 5.256253 7.848499 21.817856 1.822835 5.079738
5029 2023-12-26 30.651084 59.117534 11.901397 18.406860 3.181754 5.279244 7.894915 21.822518 1.823477 5.102816
5030 2023-12-27 30.539395 59.090557 12.001639 18.423833 3.171008 5.286345 7.942268 21.788152 1.836314 5.107011
5031 2023-12-28 30.510704 59.105974 11.951719 18.381937 3.210411 5.294121 7.984465 21.858628 1.847867 5.095472
5032 2023-12-29 30.566034 58.551060 11.903007 18.410020 3.214888 5.299531 7.975088 21.902896 1.847867 5.123794

5033 rows × 11 columns

In [30]:
# Use enumerate() method to obtain the stock names along with a counter "i" (0, 1, 2, 3,..etc.)
# This counter "i" will be used as an index to access elements in the "weights" list
initial_investment = 10000000
for i, stock in enumerate(scaled_df.columns[1:]):
    portfolio_df[stock] = weights[i] * scaled_df[stock]  * initial_investment
portfolio_df.round(2)
Out[30]:
Date ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG
0 2004-01-02 781386.10 1442829.00 461658.60 1457247.10 68205.30 1134458.00 1351477.80 1435415.00 858713.10 1008609.90
1 2004-01-05 795375.17 1480915.26 474226.54 1459287.25 68515.38 1141045.81 1348894.54 1471497.04 881660.60 1013602.64
2 2004-01-06 796174.64 1474243.17 465848.01 1418719.91 68019.32 1135995.29 1382847.82 1476725.60 881177.13 1008508.12
3 2004-01-07 788438.02 1442829.00 460765.00 1454981.30 68432.71 1136434.47 1403145.78 1475157.26 888665.37 999541.40
4 2004-01-08 759651.40 1396680.74 457190.15 1444555.78 68443.02 1141705.03 1427133.89 1472542.50 880452.49 1010036.00
... ... ... ... ... ... ... ... ... ... ... ...
5028 2023-12-22 23969947.03 85302050.13 5396437.34 26525878.47 215363.38 5962998.13 10607071.57 31317677.99 1565292.43 5123474.12
5029 2023-12-26 23950331.10 85296493.15 5494382.22 26823343.10 217012.50 5989080.97 10669801.78 31324369.43 1565843.65 5146750.61
5030 2023-12-27 23863058.77 85257568.84 5540659.66 26848077.67 216279.56 5997136.36 10733799.27 31275040.23 1576866.81 5150982.10
5031 2023-12-28 23840640.21 85279813.73 5517613.87 26787024.21 218967.02 6005958.06 10790827.71 31376203.10 1586787.73 5139343.86
5032 2023-12-29 23883874.43 84479167.39 5495125.67 26827947.95 219272.42 6012095.62 10778153.96 31439744.88 1586787.73 5167909.72

5033 rows × 11 columns

In [31]:
# Assume that we have $10,000,000 that we would like to invest in one or more of the selected stocks
# Let's create a function that receives the following arguments: 
      # (1) Stocks closing prices
      # (2) Random weights 
      # (3) Initial investment amount
# The function will return a DataFrame that contains the following:
      # (1) Daily value (position) of each individual stock over the specified time period
      # (2) Total daily value of the portfolio 
      # (3) Percentage daily return 

def asset_allocation(df, weights, initial_investment):
    portfolio_df = df.copy()

    # Scale stock prices using the "price_scaling" function that we defined earlier (Make them all start at 1)
    scaled_df = price_scaling(df)
  
    for i, stock in enumerate(scaled_df.columns[1:]):
        portfolio_df[stock] = scaled_df[stock] * weights[i] * initial_investment

    # Sum up all values and place the result in a new column titled "portfolio value [$]" 
    # Note that we excluded the date column from this calculation
    portfolio_df['Portfolio Value [$]'] = portfolio_df[portfolio_df != 'Date'].sum(axis = 1, numeric_only = True)
            
    # Calculate the portfolio percentage monthly return and replace NaNs with zeros
    portfolio_df['Portfolio monthly Return [%]'] = portfolio_df['Portfolio Value [$]'].pct_change(1) * 100 
    portfolio_df.replace(np.nan, 0, inplace = True)
    
    return portfolio_df
In [32]:
portfolio_df
Out[32]:
Date ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG
0 2004-01-02 7.813861e+05 1.442829e+06 4.616586e+05 1.457247e+06 68205.300000 1.134458e+06 1.351478e+06 1.435415e+06 8.587131e+05 1.008610e+06
1 2004-01-05 7.953752e+05 1.480915e+06 4.742265e+05 1.459287e+06 68515.375148 1.141046e+06 1.348895e+06 1.471497e+06 8.816606e+05 1.013603e+06
2 2004-01-06 7.961746e+05 1.474243e+06 4.658480e+05 1.418720e+06 68019.323888 1.135995e+06 1.382848e+06 1.476726e+06 8.811771e+05 1.008508e+06
3 2004-01-07 7.884380e+05 1.442829e+06 4.607650e+05 1.454981e+06 68432.709848 1.136434e+06 1.403146e+06 1.475157e+06 8.886654e+05 9.995414e+05
4 2004-01-08 7.596514e+05 1.396681e+06 4.571901e+05 1.444556e+06 68443.020712 1.141705e+06 1.427134e+06 1.472542e+06 8.804525e+05 1.010036e+06
... ... ... ... ... ... ... ... ... ... ... ...
5028 2023-12-22 2.396995e+07 8.530205e+07 5.396437e+06 2.652588e+07 215363.381961 5.962998e+06 1.060707e+07 3.131768e+07 1.565292e+06 5.123474e+06
5029 2023-12-26 2.395033e+07 8.529649e+07 5.494382e+06 2.682334e+07 217012.501764 5.989081e+06 1.066980e+07 3.132437e+07 1.565844e+06 5.146751e+06
5030 2023-12-27 2.386306e+07 8.525757e+07 5.540660e+06 2.684808e+07 216279.562272 5.997136e+06 1.073380e+07 3.127504e+07 1.576867e+06 5.150982e+06
5031 2023-12-28 2.384064e+07 8.527981e+07 5.517614e+06 2.678702e+07 218967.022931 6.005958e+06 1.079083e+07 3.137620e+07 1.586788e+06 5.139344e+06
5032 2023-12-29 2.388387e+07 8.447917e+07 5.495126e+06 2.682795e+07 219272.424297 6.012096e+06 1.077815e+07 3.143974e+07 1.586788e+06 5.167910e+06

5033 rows × 11 columns

In [35]:
# Now let's put this code in a function and generate random weights
# Let's obtain the number of stocks under consideration (note that we ignored the "Date" column) 
n = len(adj_close_data.columns)-1

# Let's generate random weights 
print('Number of stocks under consideration = {}'.format(n))
weights = generate_portfolio_weights(n).round(6)
print('Portfolio weights = {}'.format(weights))

# Let's test out the "asset_allocation" function
portfolio_df = asset_allocation(adj_close_data, weights, 10000000)
portfolio_df.round(2)
Number of stocks under consideration = 10
Portfolio weights = [0.130612 0.140131 0.069716 0.137146 0.105677 0.035927 0.144462 0.084744
 0.068839 0.082745]
Out[35]:
Date ADBE AMZN CAT DE EXC JNJ JPM MSFT PFE PG Portfolio Value [$] Portfolio Daily Return [%]
0 2004-01-02 1306120.00 1401310.00 697160.00 1371460.00 1056770.00 359270.00 1444620.00 847440.00 688390.00 827450.00 9.999990e+06 0.00
1 2004-01-05 1329503.33 1438300.29 716139.10 1373380.05 1061574.29 361356.29 1441858.71 868742.11 706785.93 831545.98 1.012919e+07 1.29
2 2004-01-06 1330839.67 1431820.19 703486.51 1335200.88 1053888.49 359756.84 1478152.01 871828.95 706398.36 827366.50 1.009874e+07 -0.30
3 2004-01-07 1317907.58 1401310.00 695810.55 1369327.58 1060293.48 359895.93 1499848.87 870903.03 712401.33 820010.32 1.010771e+07 0.09
4 2004-01-08 1269789.53 1356489.71 690412.10 1359515.81 1060453.23 361565.05 1525490.21 869359.32 705817.45 828619.95 1.002751e+07 -0.79
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5028 2023-12-22 40066782.89 82847389.31 8149269.30 24964318.88 3336831.03 1888413.97 11338098.00 18489324.02 1254821.49 4203229.28 1.965385e+08 0.01
5029 2023-12-26 40033994.02 82841992.24 8297177.85 25244271.98 3362382.42 1896674.11 11405151.50 18493274.51 1255263.38 4222325.00 1.970525e+08 0.26
5030 2023-12-27 39888114.62 82804188.02 8367062.35 25267550.44 3351026.28 1899225.16 11473559.61 18464151.55 1264100.13 4225796.46 1.970048e+08 -0.02
5031 2023-12-28 39850641.05 82825792.78 8332260.43 25210091.15 3392665.68 1902018.90 11534518.39 18523876.06 1272053.27 4216248.60 1.970602e+08 0.03
5032 2023-12-29 39922908.88 82048185.92 8298300.55 25248605.75 3397397.56 1903962.59 11520971.17 18561389.84 1272053.27 4239683.64 1.964135e+08 -0.33

5033 rows × 13 columns

In [38]:
# Plot the portfolio percentage monthly return
plot_financial_data(portfolio_df[['Date', 'Portfolio Daily Return [%]']], 'Portfolio Percentage monthly Return [%]')

# Plot each stock position in our portfolio over time
# This graph shows how our initial investment in each individual stock grows over time
plot_financial_data(portfolio_df.drop(['Portfolio Value [$]', 'Portfolio Daily Return [%]'], axis = 1), 'Portfolio positions [$]')

# Plot the total daily value of the portfolio (sum of all positions)
plot_financial_data(portfolio_df[['Date', 'Portfolio Value [$]']], 'Total Portfolio Value [$]')
In [ ]: